A Study of Translation Error Rate with Targeted Human Annotation

Authors

  • Matthew Snover
  • Bonnie Dorr
  • Richard Schwartz
  • John Makhoul
  • Linnea Micciulla
  • Ralph Weischedel
Abstract

We define a new, intuitive measure for evaluating machine translation output that avoids the knowledge intensiveness of more meaning-based approaches, and the labor-intensiveness of human judgments. Translation Error Rate (TER) measures the amount of editing that a human would have to perform to change a system output so it exactly matches a reference translation. We also compute a human-targeted TER (or HTER), where the minimum TER of the translation is computed against a human ‘targeted reference’ that preserves the meaning (provided by the reference translations) and is fluent, but is chosen to minimize the TER score for a particular system output. We show that: (1) The single-reference variant of TER correlates as well with human judgments of MT quality as the four-reference variant of BLEU; (2) The human-targeted HTER yields a 33% error-rate reduction and is shown to be very well correlated with human judgments; (3) The four-reference variant of TER and the single-reference variant of HTER yield higher correlations with human judgments than BLEU; (4) HTER yields higher correlations with human judgments than METEOR or its human-targeted variant (HMETEOR); and (5) The four-reference variant of TER correlates as well with a single human judgment as a second human judgment does, while HTER, HBLEU, and HMETEOR correlate significantly better with a human judgment than a second human judgment does. This work has been supported, in part, by BBNT contract number 9500006806.
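To make the idea of "amount of editing" concrete, the following is a minimal sketch of a simplified TER-style score. It is not the authors' implementation: the abstract gives no formula, so the word-level insert/delete/substitute edit distance, the omission of phrase shifts, the normalization by average reference length, and the function names `word_edit_distance` and `simplified_ter` are all assumptions made here for illustration.

```python
from typing import List, Sequence


def word_edit_distance(hyp: Sequence[str], ref: Sequence[str]) -> int:
    """Minimum number of word insertions, deletions, and substitutions
    needed to turn hyp into ref (standard Levenshtein distance on words)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(
                prev[j] + 1,         # delete a hypothesis word
                curr[j - 1] + 1,     # insert a reference word
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def simplified_ter(hypothesis: str, references: List[str]) -> float:
    """Simplified TER-style score (assumption: no shift edits).

    Takes the fewest edits against any single reference and divides by
    the average reference length in words. Lower is better; 0.0 means
    the hypothesis exactly matches one of the references.
    """
    hyp = hypothesis.split()
    refs = [r.split() for r in references]
    if not refs or any(len(r) == 0 for r in refs):
        raise ValueError("at least one non-empty reference is required")
    min_edits = min(word_edit_distance(hyp, r) for r in refs)
    avg_ref_len = sum(len(r) for r in refs) / len(refs)
    return min_edits / avg_ref_len


if __name__ == "__main__":
    print(simplified_ter("the cat sat on mat", ["the cat sat on the mat"]))
```

In the usage line, the hypothesis needs one inserted word to match the six-word reference, so the score is 1/6. Under the same assumptions, an HTER-style score would call the same function with a human-written targeted reference in place of the original references.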

Similar resources

A Study of Translation Edit Rate with Targeted Human Annotation

We examine a new, intuitive measure for evaluating machine-translation output that avoids the knowledge intensiveness of more meaning-based approaches, and the labor-intensiveness of human judgments. Translation Edit Rate (TER) measures the amount of editing that a human would have to perform to change a system output so it exactly matches a reference translation. We show that the single-refere...


Investigation of human error by using THERP method in control room of incoiler department in a pipe manufacturing company

Background & Aims of the Study: Today, in many sensitive occupational environments, human error can lead to catastrophic events. Because a malfunction or failure in the sensitive task of a control area operator can lead to irreparable events, it is important to predict human errors in order to reduce their adverse consequences. Therefore, the present study was performed aiming to ...


Translation and Psychometric Assessment of the Persian Version of Patient Trust in Midwifery Care Scale

Background: Patients’ trust in their physicians can affect therapeutic outcomes. Measurement of patient’s trust levels is a helpful approach for policymakers in healthcare systems. Aim: The present study was targeted toward the translation and psychometric assessment of patients’ trust in midwifery care questionnaire. Method: This cross-sectional study was conducted on 210 female patients refer...


Evaluation of the relationship between the uses of safety procedures in the rate of human error in Yazd Combined Cycle Power Plant

Introduction: About 60 to 90 percent of industrial accidents are caused by human error. This study aimed to assess the effectiveness of safety procedures in reducing human error among Yazd Combined Cycle Power Plant employees. Materials and Methods: The present study is a quasi-experimental intervention conducted to measure the human error of 121 employees of Yazd Combined...


Automatic Error Correction in a Syntactic Treebank Using Transformation-Based Machine Learning

A treebank is one of the most useful resources for supervised or semi-supervised learning in many NLP tasks such as speech recognition, spoken language systems, parsing and machine translation. A treebank can be developed in different ways, which can generally be categorized into manual and statistical approaches. While the resulting treebank in each of these methods has annotation errors,...


Publication date: 2005